survey question
An Analysis of Large Language Models for Simulating User Responses in Surveys
Yu, Ziyun, Zhou, Yiru, Zhao, Chen, Wen, Hongyi
Using Large Language Models (LLMs) to simulate user opinions has received growing attention. Yet LLMs, especially trained with reinforcement learning from human feedback (RLHF), are known to exhibit biases toward dominant viewpoints, raising concerns about their ability to represent users from diverse demographic and cultural backgrounds. In this work, we examine the extent to which LLMs can simulate human responses to cross-domain survey questions through direct prompting and chain-of-thought prompting. We further propose a claim diversification method CLAIMSIM, which elicits viewpoints from LLM parametric knowledge as contextual input. Experiments on the survey question answering task indicate that, while CLAIMSIM produces more diverse responses, both approaches struggle to accurately simulate users. Further analysis reveals two key limitations: (1) LLMs tend to maintain fixed viewpoints across varying demographic features, and generate single-perspective claims; and (2) when presented with conflicting claims, LLMs struggle to reason over nuanced differences among demographic features, limiting their ability to adapt responses to specific user profiles.
- Europe > Western Europe (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States (0.04)
- (2 more...)
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.93)
MASim: Multilingual Agent-Based Simulation for Social Science
Zhang, Xuan, Zhang, Wenxuan, Wang, Anxu, Ng, See-Kiong, Deng, Yang
Multi-agent role-playing has recently shown promise for studying social behavior with language agents, but existing simulations are mostly monolingual and fail to model cross-lingual interaction, an essential property of real societies. We introduce MASim, the first multilingual agent-based simulation framework that supports multi-turn interaction among generative agents with diverse sociolinguistic profiles. MASim offers two key analyses: (i) global public opinion modeling, by simulating how attitudes toward open-domain hypotheses evolve across languages and cultures, and (ii) media influence and information diffusion, via autonomous news agents that dynamically generate content and shape user behavior. To instantiate simulations, we construct the MAPS benchmark, which combines survey questions and demographic personas drawn from global population distributions. Experiments on calibration, sensitivity, consistency, and cultural case studies show that MASim reproduces sociocultural phenomena and highlights the importance of multilingual simulation for scalable, controlled computational social science.
- Asia > South Korea (0.15)
- Asia > Japan (0.05)
- South America > Peru (0.05)
- (11 more...)
- Questionnaire & Opinion Survey (1.00)
- Personal (0.93)
- Research Report > New Finding (0.67)
- Media > News (1.00)
- Law (1.00)
- Health & Medicine (1.00)
- (5 more...)
- North America > United States (0.94)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Research Report > Experimental Study (0.93)
- North America > United States (0.94)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Research Report > Experimental Study (0.93)
Beyond Demographics: Enhancing Cultural Value Survey Simulation with Multi-Stage Personality-Driven Cognitive Reasoning
Liu, Haijiang, Li, Qiyuan, Gao, Chao, Cao, Yong, Xu, Xiangyu, Wu, Xun, Hershcovich, Daniel, Gu, Jinguang
Introducing MARK, the Multi-stAge Reasoning frameworK for cultural value survey response simulation, designed to enhance the accuracy, steerability, and interpretability of large language models in this task. The system is inspired by the type dynamics theory in the MBTI psychological framework for personality research. It effectively predicts and utilizes human demographic information for simulation: life-situational stress analysis, group-level personality prediction, and self-weighted cognitive imitation. Experiments on the World Values Survey show that MARK outperforms existing baselines by 10% accuracy and reduces the divergence between model predictions and human preferences. This highlights the potential of our framework to improve zero-shot personalization and help social scientists interpret model predictions.
- Asia > China > Hubei Province > Wuhan (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Asia > Singapore (0.04)
- (8 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
ISCA: A Framework for Interview-Style Conversational Agents
Welch, Charles, Lahnala, Allison, Varadarajan, Vasudha, Flek, Lucie, Mihalcea, Rada, Boyd, J. Lomax, Sedoc, João
We present a low-compute non-generative system for implementing interview-style conversational agents which can be used to facilitate qualitative data collection through controlled interactions and quantitative analysis. Use cases include applications to tracking attitude formation or behavior change, where control or standardization over the conversational flow is desired. We show how our system can be easily adjusted through an online administrative panel to create new interviews, making the tool accessible without coding. Two case studies are presented as example applications, one regarding the Expressive Interviewing system for COVID-19 and the other a semi-structured interview to survey public opinion on emerging neurotechnology. Our code is open-source, allowing others to build off of our work and develop extensions for additional functionality.
- North America > United States > Michigan (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Europe > Bulgaria (0.04)
- (9 more...)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Personal > Interview (1.00)
Survey-to-Behavior: Downstream Alignment of Human Values in LLMs via Survey Questions
Nie, Shangrui, Mai, Florian, Kaczér, David, Welch, Charles, Zhao, Zhixue, Flek, Lucie
Large language models implicitly encode preferences over human values, yet steering them often requires large training data. In this work, we investigate a simple approach: Can we reliably modify a model's value system in downstream behavior by training it to answer value survey questions accordingly? We first construct value profiles of several open-source LLMs by asking them to rate a series of value-related descriptions spanning 20 distinct human values, which we use as a baseline for subsequent experiments. We then investigate whether the value system of a model can be governed by fine-tuning on the value surveys. We evaluate the effect of finetuning on the model's behavior in two ways; first, we assess how answers change on in-domain, held-out survey questions. Second, we evaluate whether the model's behavior changes in out-of-domain settings (situational scenarios). To this end, we construct a contextualized moral judgment dataset based on Reddit posts and evaluate changes in the model's behavior in text-based adventure games. We demonstrate that our simple approach can not only change the model's answers to in-domain survey questions, but also produces substantial shifts (value alignment) in implicit downstream task behavior.
- Europe > Austria > Vienna (0.15)
- Europe > Bulgaria (0.04)
- North America > United States > Virginia > Williamsburg (0.04)
- (8 more...)
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.46)
- Media > News (0.48)
- Health & Medicine (0.46)
- Leisure & Entertainment > Games > Computer Games (0.34)
What's Behind the Magic? Audiences Seek Artistic Value in Generative AI's Contributions to a Live Dance Performance
Bruen, Jacqueline Elise, Jeon, Myounghoon
With the development of generative artificial intelligence (GenAI) tools to create art, stakeholders cannot come to an agreement on the value of these works. In this study we uncovered the mixed opinions surrounding art made by AI. We developed two versions of a dance performance augmented by technology either with or without GenAI. For each version we informed audiences of the performance's development either before or after a survey on their perceptions of the performance. There were thirty-nine participants (13 males, 26 female) divided between the four performances. Results demonstrated that individuals were more inclined to attribute artistic merit to works made by GenAI when they were unaware of its use. We present this case study as a call to address the importance of utilizing the social context and the users' interpretations of GenAI in shaping a technical explanation, leading to a greater discussion that can bridge gaps in understanding.
- North America > United States > Virginia > Montgomery County > Blacksburg (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Quebec > Montreal (0.04)
Measuring (a Sufficient) World Model in LLMs: A Variance Decomposition Framework
Kunievsky, Nadav, Evans, James A.
Understanding whether large language models (LLMs) possess a world model-a structured understanding of the world that supports generalization beyond surface-level patterns-is central to assessing their reliability, especially in high-stakes applications. We propose a formal framework for evaluating whether an LLM exhibits a sufficiently robust world model, defined as producing consistent outputs across semantically equivalent prompts while distinguishing between prompts that express different intents. We introduce a new evaluation approach to measure this that decomposes model response variability into three components: variability due to user purpose, user articulation, and model instability. An LLM with a strong world model should attribute most of the variability in its responses to changes in foundational purpose rather than superficial changes in articulation. This approach allows us to quantify how much of a model's behavior is semantically grounded rather than driven by model instability or alternative wording. We apply this framework to evaluate LLMs across diverse domains. Our results show how larger models attribute a greater share of output variability to changes in user purpose, indicating a more robust world model. This improvement is not uniform, however: larger models do not consistently outperform smaller ones across all domains, and their advantage in robustness is often modest. These findings highlight the importance of moving beyond accuracy-based benchmarks toward semantic diagnostics that more directly assess the structure and stability of a model's internal understanding of the world.
- North America > United States > California > San Francisco County > San Francisco (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > France (0.04)
- (2 more...)
- Health & Medicine (0.68)
- Education (0.67)
- Transportation (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Redefining Research Crowdsourcing: Incorporating Human Feedback with LLM-Powered Digital Twins
Chan, Amanda, Di, Catherine, Rupertus, Joseph, Smith, Gary, Rao, Varun Nagaraj, Ribeiro, Manoel Horta, Monroy-Hernández, Andrés
Crowd work platforms like Amazon Mechanical Turk and Prolific are vital for research, yet workers' growing use of generative AI tools poses challenges. Researchers face compromised data validity as AI responses replace authentic human behavior, while workers risk diminished roles as AI automates tasks. To address this, we propose a hybrid framework using digital twins, personalized AI models that emulate workers' behaviors and preferences while keeping humans in the loop. We evaluate our system with an experiment (n=88 crowd workers) and in-depth interviews with crowd workers (n=5) and social science researchers (n=4). Our results suggest that digital twins may enhance productivity and reduce decision fatigue while maintaining response quality. Both researchers and workers emphasized the importance of transparency, ethical data use, and worker agency. By automating repetitive tasks and preserving human engagement for nuanced ones, digital twins may help balance scalability with authenticity.
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.05)
- North America > United States > New Jersey > Mercer County > Princeton (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.93)
- Information Technology > Communications > Social Media > Crowdsourcing (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)